Towards Privacy-Preserving Query Log Publishing

Authors

  • Li Xiong
  • Eugene Agichtein
Abstract

It’s an open secret that search engines collect detailed query logs, and sometimes release these data to third parties. While making this wealth of information available provides enormous opportunities for information retrieval and web mining research, it also raises serious concerns about the privacy of individuals. We strongly believe that these data should be published to allow researchers to develop new information access algorithms; however, it is desirable to anonymize the logs so that they remain usable for research but do not contain sensitive information. The most important need is to define in a principled way the notion of privacy for query logs. This paper attempts to lay out some dimensions for defining privacy guidelines for query log publishing. We focus on the central issue of how to strike a balance between protecting the sensitive information and maintaining useful data for analysis. This work is within the overall vision of developing anonymization techniques to allow construction of IR algorithms (e.g., spelling correction) that maintain state-of-the-art performance over the anonymized data. We first describe some important applications of query log analysis and discuss their requirements on the degree of granularity of query logs. We then analyze the sensitive information in query logs and classify it from the privacy perspective. We lay out two orthogonal dimensions for anonymizing query logs and present a spectrum of approaches along those dimensions. We discuss whether existing privacy guidelines such as HIPAA can apply to query logs directly, or whether these guidelines require significant adaptation. For each of the approaches, we discuss the implications for query log utility with respect to the important applications, as well as for the privacy of the anonymized query logs. More generally, our goal is to bring up questions and suggest challenges for privacy-preserving query log publishing.

Related resources

PPDP-MLT: K-anonymity Privacy Preservation for Publishing Search Engine Logs

In this paper we investigate the problem of protecting privacy for publishing search engine logs. Search engines play a crucial role in the navigation through the vastness of the Web. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and...

Layered Approach for Personalized Search Engine Logs Privacy Preserving

In this paper we examine the problem of defending privacy for publishing search engine logs. Search engines play a vital role in the navigation through the enormity of the Web. Privacy-preserving data publishing (PPDP) provides techniques and tools for publishing helpful information while preserving data privacy. Recently, PPDP has received significant attention in research communities, and sev...

Privacy Preserving Web Query Log Publishing: A Survey on Anonymization Techniques

Releasing Web query logs, which contain valuable information for research or marketing, can breach the privacy of search engine users. Therefore, rendering query logs so as to limit the linking of a query to an individual, while preserving the usefulness of the data for analysis, is an important research problem. This survey provides an overview and discussion of recent studies in this direction.

Access Control Friendly Query Verification for Outsourced Data Publishing

Outsourced data publishing is a promising approach to achieve higher distribution efficiency, greater data survivability, and lower management cost. In outsourced data publishing (sometimes referred to as third-party publishing), a data owner gives the content of databases to multiple publishers which answer queries sent by clients. In many cases, the trustworthiness of the publishers cannot be...

Towards Privacy Preserving Publishing of Set-valued Data on Hybrid Cloud

Storage as a service has become an important paradigm in cloud computing for its great flexibility and economic savings. However, the development is hampered by data privacy concerns: data owners no longer physically possess the storage of their data. In this work, we study the issue of privacy-preserving set-valued data publishing. Existing data privacy-preserving techniques (such as encryption...

Publication date: 2007